Abstract

Two models of bias in evolutionary mechanisms are presented: one of bias in migration
in Wright's island model and the other of bias in gene conversion among members of a
multigene family in a panmictic population. The models have an identical diffusive limit
for a large population size, where allele frequencies in each subpopulation of the n-island
model are identical to allele frequencies at each locus of the n-gene family. The probability
of fixation of a new mutant throughout the total population/gene family is obtained from
the diffusion process. It is shown that the deviation of the probability of fixation from that
without bias is proportional to one plus the number of migrants/converted alleles. For the
island model with allele-dependent migration, an analogue of the coalescent genealogy,
which we call the ancestral bias graph, is introduced. We present a recursion formula
that can be used to compute the probability of obtaining a given sample. We apply our
formula to a data set of the mouse histone gene family. The results suggest that bias in
ectopic gene conversion, whose magnitude is orders of magnitude larger than that in allelic
gene conversion, can be maintained in a population. Evidence of the large impact of
biased gene conversion on gene substitution has recently been accumulating. The fact that
the diffusion models for conversion bias and for migration bias are identical suggests that
migration bias can also have a large impact on genomic polymorphism and divergence in
subdivided populations.

Keywords: biased gene conversion, biased migration, diffusion model, ancestral process, 
biased voter model 
1. Introduction

Evolutionary mechanisms are frequently biased. Because a slight bias in an evolutionary
mechanism could result in apparent natural selection at neutral loci, a quantitative
understanding of the effects of biases in evolutionary mechanisms is necessary for
understanding molecular evolution.

Evidence of bias in gene conversion, a recombination-associated molecular drive that
favors AT to GC mutations, has recently been accumulating [1, 2]. Nagylaki [3] showed
that bias in allelic conversion is equivalent to directional selection. Regions of a genome
that evolve rapidly are generally regarded as being under strong positive selection.
However, it was shown that many protein-coding changes in the fastest changing genes of
the human genome are not a result of selection operating on the genes; rather, they result
from biased fixation of AT to GC mutations [4]. It was also suggested that ectopic
gene conversion among members of multigene families under concerted evolution [5] is
biased. In histone paralogous genes obtained from humans and mice, it was found that
gene copies that belong to subfamilies with very similar sequences (presumably undergoing
gene conversion) have a higher GC content than unique gene copies (presumably not
undergoing gene conversion) [6].

In human populations, migrants are rarely a random sample of their source population.
It was demonstrated that patrilocal populations, where men tend to stay at their birthplace
while women move to their husband's birthplace, exhibit greater Y-chromosomal than
mitochondrial DNA genetic differentiation, and in matrilocal populations, where women
tend to stay at their birthplace, the converse trend is observed [7]. In addition, differences
in social organization between herders and agriculturists could enhance genetic
differentiation [8]. It is therefore reasonable to assume that the migration rate depends
on the allelic type. If so, neutral evolution in a subdivided population might not appear
to be neutral. The difference in migration rates considered here is not necessarily
a result of differences in the biological functions associated with the allelic
types. Consider a population founded as a mixture of two ancestral populations with very
different genetic backgrounds. If one ancestral population has a higher tendency to
migrate than the other, alleles specific to the former population
should have a higher migration rate than alleles specific to the latter population.

Hence, bias in evolution is an intriguing issue. Nevertheless, few efforts have been
devoted to modeling such biases. In this study, we discuss biases in two evolutionary
processes: gene conversion among members of a multigene family in a panmictic
population and Wright's island model [9] with allele-dependent migration. We present
continuous-time Moran models for these biases. The diffusive limits of these models are
identical, where allele frequencies in each subpopulation of the n-island model are identical
to allele frequencies at each locus of the n-gene family. The probability of fixation of a
new mutant throughout the total population/gene family is obtained from the diffusion
process. Then, we describe the island model with biased migration in terms of a biased
voter model [10]. By using a duality, we obtain a random graph, which is analogous to
the coalescent genealogy [11, 12, 13]. We call this graph the ancestral bias graph. We
present a recursion that can be used to approximate the probability of obtaining a given
sample by simulating backwards along sample paths of the ancestral bias graph. In the
strong migration/conversion limit [13], the diffusion model is equivalent to that for
directional selection, and thus, the ancestral bias graph reduces to the ancestral selection
graph [15]. Using the formula, we demonstrate how to quantify bias in gene conversion
among members of a multigene family.

2. Formulation 

Consider a subdivided population consisting of $n$ demes, where each deme is occupied
by $N$ haploid individuals and all pairs of demes can exchange migrants symmetrically
($n$-island model). The population evolves according to a continuous-time Moran model
with reversible mutation. The sizes of the demes are kept constant under migration. We
limit the discussion to two allelic types, $A_1$ and $A_2$, and assume that mutation
between the two alleles is symmetric. The Moran model is a continuous-time Markov
process in which individuals produce one offspring at a time. The type of the offspring is
chosen according to the mutation process. The offspring then replaces an individual
who is chosen at random from the same deme or from one of the other demes. The offspring
may replace its own parent. The replaced individual is removed from the population, and
thus, the deme size is kept constant. We assume that an individual reproduces at a
rate of $\lambda_0$ and replaces an individual from the same deme. In addition, in the
migration process, alleles $A_1$ and $A_2$ replace an individual in each of the other demes
at rates $\lambda_0\xi_1$ and $\lambda_0\xi_2$, respectively. The offspring has the same type
as the parent with probability $1 - u$ and has the other type with probability $u$,
$u \in [0, 1]$. We set $\xi_1 = m(1 - b)/(n - 1)$ and $\xi_2 = m(1 + b)/(n - 1)$ with
$0 \le b < 1$ and $m > 0$. Migration is biased if $b > 0$. The state of the population at
time $t$ can be represented as a continuous-time Markov chain $Z(t) = (Z_i(t))$,
$i = 1, 2, \ldots, n$, where $Z_i(t)$ is the number of individuals of type $A_1$ in the
$i$-th deme at time $t$. If $Z(t) = z$, $z_i = 0, 1, \ldots, N$, the transition to $z + e_i$,
where $e_i$, $i = 1, 2, \ldots, n$, is the $i$-th unit vector, occurs at a rate of

$$\lambda_0\Big\{z_i + \xi_1\sum_{k \neq i} z_k\Big\}\frac{N - z_i}{N}(1 - u)
+ \lambda_0\Big\{(N - z_i) + \xi_2\sum_{k \neq i}(N - z_k)\Big\}\frac{N - z_i}{N}u, \tag{2.1}$$

and the transition to $z - e_i$ occurs at a rate of

$$\lambda_0\Big\{(N - z_i) + \xi_2\sum_{k \neq i}(N - z_k)\Big\}\frac{z_i}{N}(1 - u)
+ \lambda_0\Big\{z_i + \xi_1\sum_{k \neq i} z_k\Big\}\frac{z_i}{N}u. \tag{2.2}$$

Thus, $Z(t)$ is an $n$-dimensional birth and death process with nonlinear birth and death
rates. We consider the limiting diffusion approximation of the model described above.
We set $\lambda_0 = N/2$ and assume that $Nu \to \theta$ and $Nm \to \gamma$ as
$N \to \infty$. Let $X_i(t)$ denote the fraction of genes of type $A_1$ in the $i$-th deme
in the limiting process at time $t$. The generator of the diffusion process
$\{X(t); t \ge 0\}$ in the $n$-dimensional cube $[0, 1]^n$ is

$$L = L_0 - bL_1, \tag{2.3}$$

where

$$L_0 = \sum_{i=1}^{n}\left[\frac{x_i(1 - x_i)}{2}\frac{\partial^2}{\partial x_i^2}
+ \left\{\frac{n\gamma}{2(n - 1)}(\bar{x} - x_i) + \frac{\theta}{2}(1 - 2x_i)\right\}\frac{\partial}{\partial x_i}\right],$$

$$L_1 = \frac{\gamma}{2(n - 1)}\sum_{i=1}^{n}\left\{n\bar{x} + (n - 2)x_i - 2nx_i\bar{x} + 2x_i^2\right\}\frac{\partial}{\partial x_i}.$$

Here, $\bar{x}$ is the arithmetic mean of $x_1, \ldots, x_n$. The limiting diffusion also
appears as the limit of the discrete-time Wright-Fisher model, but the continuous-time
Moran model is suitable for deriving the genealogical process.

The diffusive limit for the $n$-island model coincides with that for a model
of gene conversion among members of a multigene family consisting of $n$ unlinked genes,
where $X_i(t)$ denotes the fraction of genes of type $A_1$ at the $i$-th locus of the
$n$-gene family. We now consider a monoecious panmictic population consisting of $N$
diploid individuals. The population evolves according to a continuous-time Moran model
with reversible mutation. The Moran model is a continuous-time Markov process in which
haplotypes, whose alleles at each locus are chosen randomly from the population, produce
one offspring at a time. The type of the offspring is chosen according to the mutation and
conversion processes. The offspring then replaces a haplotype chosen at random from the
population. The offspring may replace its own parent. The replaced haplotype is removed
from the population; thus, the population size is kept constant. We assume that a haplotype
reproduces at a rate $\lambda_0$ and replaces a haplotype of the population. Let $c$ be the
rate at which a gene is converted by any one of the other $n - 1$ genes with equal
likelihood. Only a subset of the total conversion events involve different alleles. Among
the conversion events involving different alleles, let $(1 + b)/2$ be the fraction of events
that result in allele $A_1$ being converted by allele $A_2$, and similarly, let $(1 - b)/2$
be the fraction of events that result in allele $A_2$ being converted by allele $A_1$,
$0 \le b < 1$ [3, 17]. We say that conversion is biased if $b > 0$. The rate at which an
allele $A_1$ ($A_2$) is converted by an allele $A_2$ ($A_1$) is $c(1 + b)/(n - 1)$
($c(1 - b)/(n - 1)$). For example, when $n = 3$, a haplotype $A_1A_1A_2$ changes to
$A_1A_2A_2$, $A_2A_1A_2$, and $A_1A_1A_1$ at rates $c(1 + b)/2$, $c(1 + b)/2$, and
$c(1 - b)$, respectively. For each locus, the offspring has the same allelic type as the
parent with probability $1 - u$ and has the other type with probability $u$. We represent
haplotypes by binary strings, where the $i$-th digit is 1 or 0 when the $i$-th locus is
occupied by allele $A_1$ or $A_2$, respectively. The state of the population at time $t$
can be represented as a continuous-time Markov chain $W(t) = (W_\alpha(t))$, where
$W_\alpha(t)$ is the number of haplotypes of type $\alpha$ in the population at time $t$.
If $W(t) = w$, $w_\alpha = 0, 1, \ldots, 2N$, the transition to $w + e_\alpha$
occurs at a rate of

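$$\lambda_0 w_\alpha\frac{2N - w_\alpha}{2N}\Big(1 - \sum_{\beta \neq \alpha} Q_{\alpha\beta}\Big)
+ \lambda_0\sum_{\beta \neq \alpha} w_\beta\frac{2N - w_\alpha}{2N}Q_{\beta\alpha} \tag{2.4}$$

and the transition to $w - e_\alpha$ occurs at a rate of

$$\lambda_0 w_\alpha\frac{w_\alpha}{2N}\sum_{\beta \neq \alpha} Q_{\alpha\beta}
+ \lambda_0\sum_{\beta \neq \alpha} w_\beta\frac{w_\alpha}{2N}(1 - Q_{\beta\alpha}), \tag{2.5}$$

where $Q_{\alpha\beta}$ is the rate at which a haplotype $\alpha$ changes to a haplotype
$\beta$. For example, $Q_{110,100} = c(1 + b)/2 + u(1 - u)^2$. We consider the limiting
diffusion approximation of this model. We set $\lambda_0 = N$ and assume that
$2Nu \to \theta$ and $2Nc \to \gamma$ as $N \to \infty$. Let $X_i(t)$ denote the fraction of
genes of type $A_1$ at the $i$-th locus in the limiting process at time $t$. We have

$$\frac{W_\alpha(t)}{2N} = \prod_{i=1}^{n} X_i(t)^{\alpha_i}\{1 - X_i(t)\}^{1 - \alpha_i}, \tag{2.6}$$

where $\alpha_i$ is the $i$-th digit of $\alpha$. Then, the generator of the diffusion process
$\{X(t); t \ge 0\}$ in the $n$-dimensional cube $[0, 1]^n$ is identical to that in
Eq. (2.3). The limiting diffusion also appears as the limit of the discrete-time
Wright-Fisher model.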
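The passage from the Moran rates to the drift of the diffusion can be checked numerically for the island model. The following is a minimal sketch, not part of the original analysis: it evaluates the birth and death rates of the island model as described in the text (parents of type $A_1$ send offspring to other demes at rate $\lambda_0\xi_1$, parents of type $A_2$ at rate $\lambda_0\xi_2$), rescales them with $\lambda_0 = N/2$, $u = \theta/N$, $m = \gamma/N$, and compares the result with the large-$N$ drift obtained by our own algebra. The function names and the closed form in `limiting_drift` are assumptions introduced for illustration.

```python
def moran_rates(z, i, N, n, m, b, u):
    """Birth and death rates of type A1 in deme i for the island Moran model."""
    lam0 = N / 2.0                      # reproduction rate used in the diffusion scaling
    xi1 = m * (1.0 - b) / (n - 1)       # migration factor of A1
    xi2 = m * (1.0 + b) / (n - 1)       # migration factor of A2
    others = [k for k in range(n) if k != i]
    a1_parents = z[i] + xi1 * sum(z[k] for k in others)
    a2_parents = (N - z[i]) + xi2 * sum(N - z[k] for k in others)
    # An A1 offspring (unmutated A1 parent or mutated A2 parent) replaces an A2 in deme i.
    birth = lam0 * (a1_parents * (1 - u) + a2_parents * u) * (N - z[i]) / N
    # An A2 offspring (unmutated A2 parent or mutated A1 parent) replaces an A1 in deme i.
    death = lam0 * (a2_parents * (1 - u) + a1_parents * u) * z[i] / N
    return birth, death


def scaled_drift(x, i, N, n, gamma, theta, b):
    """Expected change per unit time of the A1 frequency in deme i at finite N."""
    z = [xk * N for xk in x]
    birth, death = moran_rates(z, i, N, n, gamma / N, b, theta / N)
    return (birth - death) / N


def limiting_drift(x, i, n, gamma, theta, b):
    """Large-N drift (our own algebra): neutral part minus b times the bias part."""
    xbar = sum(x) / n
    s = n * xbar - x[i]                 # sum of frequencies over the other demes
    neutral = 0.5 * gamma * n / (n - 1) * (xbar - x[i]) + 0.5 * theta * (1 - 2 * x[i])
    bias = gamma / (2.0 * (n - 1)) * (s + (n - 1) * x[i] - 2 * x[i] * s)
    return neutral - b * bias
```

For a population of $N = 10^6$ individuals per deme the two functions agree to within $O(1/N)$, which is one way to see that the bias enters the drift linearly in $b$.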
The probability of fixation of allele $A_1$ with initial frequencies $p$, $\pi(p)$,
satisfies the partial differential equation

$$L\pi(p) = 0 \tag{2.7}$$

with $\theta = 0$ and the boundary conditions $\pi(0) = 0$, $\pi(1) = 1$,
$\pi|_S = \text{finite}$, where $S = \partial[0, 1]^n - \{0, 1\}$. For details on the
boundary conditions, see [16]. We expand the solution as

$$\pi(p) = \pi^{(0)}(p) + b\pi^{(1)}(p) + O(b^2). \tag{2.8}$$

Substituting Eq. (2.8) into Eq. (2.7), we obtain

$$\pi^{(0)}(p) = \bar{p}, \tag{2.9}$$

$$\pi^{(1)}(p) = -\bar{p}\{n - 1 + n\gamma(1 - \bar{p})\} + \frac{2}{n}\sum_{i<j} p_ip_j. \tag{2.10}$$

It is straightforward to confirm that Eqs. (2.9) and (2.10) are valid by substituting them
into Eq. (2.7) and the boundary conditions; see Appendix for the derivation. We have

$$\pi\Big(\frac{e_i}{N}\Big) \approx \frac{1}{nN}\{1 - (n - 1)(1 + \gamma')b + O(b^2)\}, \tag{2.11}$$

as $N \to \infty$, where $\gamma' = n\gamma/(n - 1)$. In terms of the model of gene conversion
among members of a multigene family, this expression agrees with that in the weak conversion
limit ($\gamma \to 0$, Eq. 8 in [18]). The effect of bias on the probability of fixation
remains in the weak conversion limit ($\gamma \to 0$). The weak conversion limit is
different from the case $c = 0$. In the weak conversion limit, all loci are monomorphic
except for very short periods of time when a polymorphism is segregating at a single locus.
Genetic drift causes the segregating locus to become fixed, either for the introduced allele
or for the original allele. After a long time, biased conversion creates another
polymorphic locus, and the process continues until all loci are fixed for the same allele.
Since the locus-by-locus spreading process is biased, the effect of bias should be present
in the probability of fixation. Note that the spreading process is impossible when
$c = 0$. Walsh [18] showed that if selection is weak, a slight conversion bias can alter the
probability of fixation. When $\gamma$ is large, the effect could be significant, as shown
recently by simulations [16]. Eq. (2.11) shows that under neutrality the deviation of the
probability of fixation from that without bias is proportional to one plus the number of
converted
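genes. $nNu\,\pi(e_i/N)$ yields the substitution rate. Since time is measured in units of
$\lambda_0 = N/2$ birth events, that is

$$\frac{nN\theta}{2} \times \pi(p_0) = \frac{\theta}{2}\{1 - (n - 1)(1 + \gamma')b + O(b^2)\}, \tag{2.12}$$

where $p_0 = e_i/N$ is the initial state with a single new mutant.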
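The first-order expansion of the fixation probability is easy to evaluate numerically. The sketch below is illustrative only: it codes the expansion $\pi(p) = \bar{p} + b\{-\bar{p}(n - 1 + n\gamma(1 - \bar{p})) + (2/n)\sum_{i<j}p_ip_j\}$, checks the boundary values, and compares the value for a single new mutant with the approximation $\{1 - (n - 1)(1 + \gamma')b\}/(nN)$. The function names are hypothetical, and terms of $O(b^2)$ are ignored throughout.

```python
from itertools import combinations


def fixation_probability(p, gamma, b):
    """First-order (in b) fixation probability of A1 for initial frequencies p."""
    n = len(p)
    pbar = sum(p) / n
    pair_sum = sum(pi * pj for pi, pj in combinations(p, 2))
    pi0 = pbar
    pi1 = -pbar * (n - 1 + n * gamma * (1 - pbar)) + 2.0 / n * pair_sum
    return pi0 + b * pi1


def new_mutant_approximation(n, N, gamma, b):
    """Large-N fixation probability of a single new mutant in one deme/locus."""
    gamma_prime = n * gamma / (n - 1)
    return (1 - (n - 1) * (1 + gamma_prime) * b) / (n * N)
```

With, say, $n = 4$, $\gamma = 1.5$ and $b = 0.01$, the correction factor is $1 - (n - 1)(1 + \gamma')b = 0.91$, so the new mutant fixes about 9% less often than a neutral one.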

3. A biased voter model and the ancestral bias graph

Krone and Neuhauser showed that the continuous-time Moran model with selection
and mutation can be formulated in terms of the biased voter model with mutations on a
complete graph [15]. The continuous-time Moran model with $n$ demes under biased
migration introduced above also has an alternative formulation in terms of the biased voter
model on a set of complete graphs. Let $I = (I_i)$, $I_i = \{1, 2, \ldots, N\}$,
$i = 1, 2, \ldots, n$, denote sets of sites, where $I_i$ is the set of sites in the $i$-th
graph. The biased voter model with mutation and biased migration on the set of graphs is a
continuous-time Markov process whose state at time $t$ is denoted by
$\eta_t : I \to \{1, 2\}$. If $x \in I_i$ and $\eta_t(x) = 1$ (2), then we say
that $x$ is occupied by an individual of type $A_1$ ($A_2$) at time $t$. The process
$\{\eta_t; t \ge 0\}$ evolves according to the following rules: (i) For $x = 1, 2, \ldots, N$
and $i = 1, 2, \ldots, n$, the individual at $x \in I_i$ produces an offspring at a rate of
$\lambda_0$ within $I_i$; (ii) The offspring has the same type as the parent with probability
$1 - u$ and has the other type with probability $u$; (iii) For $x = 1, 2, \ldots, N$,
$j \neq i$; $i, j = 1, 2, \ldots, n$, the individual at $x \in I_i$ produces an
offspring in $I_j$ at a rate depending on its allelic type: if $\eta_t(x) = 1$ (2), the rate
is $\lambda_0\xi_1$ ($\lambda_0\xi_2$); (iv) At the time when a birth event occurs, one of
the $N$ sites is chosen at random and the individual at this site is replaced by the
offspring (the offspring is allowed to replace its own parent). We assume that
$\xi_2 - \xi_1 = 2mb/(n - 1)$. The birth events in the biased voter model are hierarchical:
two types of birth events, namely, events within a graph and events between graphs, are
considered. A bias is present only if a birth event involves different graphs.

The process can be constructed using a percolation diagram [15, 19]. The idea is
to construct the process from a collection of independent Poisson processes by drawing
arrows on the space-time coordinate system $I \times [0, \infty)$. These arrows indicate
where and when an offspring is produced and where it is sent. We begin by attaching arrows
to each timeline at the arrival times of the Poisson processes that describe the birth
process. For each $(x, y) \in I_i^2$, $i = 1, 2, \ldots, n$, let $\{W_s^{x,y}; s \ge 1\}$
denote the arrival times of a Poisson process with rate $\lambda_0/N$. For each
$(x, y) \in I_i \times I_j$, $i \neq j$; $i, j = 1, 2, \ldots, n$, let
$\{Z_s^{x,y}; s \ge 1\}$ denote the arrival times of a Poisson process with rate
$\lambda_0\xi_2/N$. Let $\{U_s^{x,y}; s \ge 1\}$ and $\{V_s^{x,y}; s \ge 1\}$ be sequences of
independent, uniformly distributed random variables in $(0, 1)$. At times $W_s^{x,y}$ we
draw an arrow from $x \in I_i$ to $y \in I_i$ to indicate the birth of an offspring at $x$
that is sent to $y$. At times $Z_s^{x,y}$ we draw an arrow from $x \in I_i$ to
$y \in I_j$ to indicate the birth of an offspring at $x$ which is then sent to $y$. If
$U_s^{x,y} < \xi_1/\xi_2$, we place a "$\delta$" at the tip of the arrow; otherwise, we
label the arrow with a "2". In other words, $\delta$-arrows and 2-arrows enter a site $y$
at rates $\lambda_0\xi_1$ and $\lambda_0(\xi_2 - \xi_1)$, respectively. Then, the following
rule applies: 2's can give birth through both types of arrows, but 1's can only give birth
through $\delta$-arrows. The sequence $\{V_s^{x,y}; s \ge 1\}$ is used for the mutation
process; if $V_s^{x,y} < u$, a mutation occurs. We represent a mutation event by a solid dot
on the arrow. A realization of the percolation diagram in the case $n = 2$ and $N = 4$ is
shown in Fig. 1. $I_1$ and $I_2$ are the left and the right graph, respectively. If the set
of 1's initially is $\{1\} \subset I_2$, then, at time $t$, the set of 1's is
$\{2\} \subset I_1$ and $\{3\} \subset I_2$. The paths of the 1's are indicated by thick
lines. By reversing time, we can follow the ancestral history of individuals at sites in a
finite set and thus determine their types. The resulting process is called the dual or
ancestral process [10, 15, 20]. A realization of the dual process, which was obtained from
Fig. 1 by simply reversing time and the direction of the arrows, is shown in Fig. 2. Here,
the ancestral history of a sample consisting of the individuals at sites
$\{1, 2\} \subset I_1$ and $\{1\} \subset I_2$ at dual time 0 is indicated by thick lines.

To obtain the analogue of the coalescent, we rescale time and the parameters as in
the previous section: $\lambda_0 = N/2$, $\xi_2 = m(1 + b)/(n - 1)$,
$\xi_1 = m(1 - b)/(n - 1)$, with $Nu \to \theta$ and $Nm \to \gamma$ as $N \to \infty$. The
dynamics of the dual process follow readily from the percolation diagram. We can ignore
events in which a particle in the dual process crosses an unmarked arrow and lands either
on a site not occupied by the dual process or on its own site. To describe the other
possible events, assume that there are $k = (k_i)$, $i = 1, 2, \ldots, n$, particles in the
dual process, where $k_i$ is the number of particles contained in $I_i$. We say that a
coalescing event has occurred when a particle crosses an unmarked arrow and lands on the
site of a different particle contained in the dual process. This occurs at rates

$$\lambda_0\frac{k_i(k_i - 1)}{N} = \frac{k_i(k_i - 1)}{2}, \quad i = 1, 2, \ldots, n. \tag{3.1}$$

We say that a migration event has occurred when a particle crosses a $\delta$-arrow. This
occurs at rates

$$\lambda_0\xi_1 k_i\frac{N - k_j}{N} \to \frac{\gamma(1 - b)}{2(n - 1)}k_i, \quad N \to \infty, \tag{3.2}$$

for $j \neq i$; $i, j = 1, 2, \ldots, n$. We say that a branching event has occurred when a
particle crosses a 2-arrow. This occurs at a rate of

$$\lambda_0(\xi_2 - \xi_1)k_i\frac{N - k_j}{N} \to \frac{b\gamma}{n - 1}k_i, \quad N \to \infty, \tag{3.3}$$

for $i \neq j$; $i, j = 1, 2, \ldots, n$. The original particle continues along the old path
(continuing branch) and the new particle that arose from the branching follows the 2-arrow
(incoming branch). If the new particle lands on a site that is already contained in the dual
process, we say a collision has occurred, but collisions can be ignored in the diffusive
limit $N \to \infty$. We call the random graph generated by the dual process in the
diffusive limit the ancestral bias graph.
Let $A_{\mathbf{n}}(t) = (A_{\mathbf{n},i}(t))$, $i = 1, 2, \ldots, n$, denote the numbers of
particles that are present in the dual process at dual time $t \ge 0$, where
$A_{\mathbf{n},i}(t)$ is the number of particles contained in $I_i$. We set
$A_{\mathbf{n}}(0) = \mathbf{n}$. The size process is an $n$-dimensional birth and death
process. If $A_{\mathbf{n}}(t) = k$, the transition to $k + e_j$ occurs at a rate
$b\gamma(n\bar{k} - k_j)/(n - 1)$, where $\bar{k}$ is the arithmetic mean of
$k_1, \ldots, k_n$, and the transition to $k - e_j$ occurs at a rate $k_j(k_j - 1)/2$. Due
to the moment duality between the birth and death process and the Wright-Fisher diffusion
governed by the generator in Eq. (2.3), the stationary measure $\phi$ of the birth and death
process can be obtained from the probability of fixation in the Wright-Fisher diffusion,
Eq. (2.8), as was shown for the ancestral selection graph [21]. In particular,

$$\phi(e_i) = \frac{1}{n}\{1 - (n - 1)(1 + \gamma')b\} + O(b^2), \tag{3.4}$$

$$\phi(2e_i) = b\frac{\gamma}{n} + O(b^2), \tag{3.5}$$

$$\phi(e_i + e_j) = b\frac{2}{n}(1 + \gamma) + O(b^2), \tag{3.6}$$

for $i \neq j$; $i, j = 1, 2, \ldots, n$. Other configurations are $O(b^2)$. In the weak
migration limit ($\gamma \to 0$), the configurations that have non-zero probabilities up to
$O(b^2)$ are $e_i$ and $e_i + e_j$. In the weak migration limit, the waiting time for a
coalescing event is negligible compared with the waiting times for migration and branching
events. Thus, either a single ancestor is in a deme, or two ancestors, one of which was
produced by a recent branching event, are in different demes and are waiting to be in the
same deme and coalesce. To simulate a sample, we may stop the process at the time at which
the total size of the dual process reaches 1, since the type of the particle present at this
time determines the types in the sample. We call the particle at this point in time the
ultimate ancestor. As for the ancestral selection graph [15], branches in the ancestral bias
graph do not necessarily represent the true genealogy. Depending on the type of the ultimate
ancestor and the mutation events along the branches, certain parts of the ancestral graph
may not be accessible to individuals since only individuals of type $A_2$ may cross
2-arrows. By following the paths in the backward direction to the ultimate ancestor, we
obtain the ancestral paths of each individual and hence the true genealogy of the sample.
The true genealogy depends on the type of the ultimate ancestor. In Fig. 2, if the type of
the ultimate ancestor is $A_1$, the true genealogy contains the dotted line and does not
contain the dashed line, and vice versa.
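The link between the stationary measure of the size process and the probability of fixation can be made concrete with a small numerical check. The sketch below is illustrative and assumes that the duality takes the form $\pi(p) = \sum_k \phi(k)\prod_i p_i^{k_i}$, truncated at first order in $b$; under that assumption the weights $\phi(e_i)$, $\phi(2e_i)$ and $\phi(e_i + e_j)$ should sum to one over all configurations and reproduce the first-order fixation probability. The function names are hypothetical.

```python
from itertools import combinations


def stationary_measure(n, gamma, b):
    """First-order weights of the configurations e_i, 2e_i and e_i + e_j."""
    single = (1 - b * (n - 1 + n * gamma)) / n   # equals {1 - (n-1)(1+gamma')b}/n
    double_same = b * gamma / n
    double_diff = 2 * b * (1 + gamma) / n
    return single, double_same, double_diff


def fixation_from_dual(p, gamma, b):
    """Fixation probability as the expected product of p_i over ancestral particles."""
    n = len(p)
    single, double_same, double_diff = stationary_measure(n, gamma, b)
    return (single * sum(p)
            + double_same * sum(x * x for x in p)
            + double_diff * sum(x * y for x, y in combinations(p, 2)))


def fixation_first_order(p, gamma, b):
    """First-order expansion of the fixation probability in the bias b."""
    n = len(p)
    pbar = sum(p) / n
    pair_sum = sum(x * y for x, y in combinations(p, 2))
    return pbar + b * (-pbar * (n - 1 + n * gamma * (1 - pbar)) + 2.0 / n * pair_sum)
```

The agreement of the two fixation formulas is an algebraic identity at this order, so the check mainly guards against transcription errors in the coefficients.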

Griffiths and Tavaré [22] introduced an importance sampling algorithm for computing
the probability distribution of samples taken from a population that evolves according to
certain neutral models. The algorithm is based on a recursion satisfied by the sampling
distribution. A scheme using Markov chain Monte Carlo to simulate backwards along
sample paths of the ancestral bias graph can approximate the sampling distribution of
our model by using recursions similar to those for the neutral cases. If
$f = (f_i)$, $i = 1, 2, \ldots, n$, genes are sampled, with $f_i$ genes taken from the $i$-th
deme, of which $a_i$ and $d_i$ genes are of type $A_1$ and of type $A_2$, respectively, we
say that the sample is of type configuration $(a, d)$. Let $X$ be distributed
according to the stationary distribution. We denote by $q(a, d)$ the probability that a
sample of $f$ genes taken from a population in equilibrium is of type configuration
$(a, d)$. Then, it follows that

$$q(a, d) = E\left[\prod_{i=1}^{n}\binom{f_i}{a_i}X_i^{a_i}(1 - X_i)^{d_i}\right]. \tag{3.7}$$
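The probability $q(a, d)$ satisfies a recursion.

Theorem 3.1. The probability $q(a, d)$ satisfies

$$\begin{aligned}
r(a, d)q(a, d) ={}& \sum_{i=1}^{n}\{(a_i - 1)f_iq(a - e_i, d) + (d_i - 1)f_iq(a, d - e_i)\} \\
&+ \theta\sum_{i=1}^{n}\{(d_i + 1)q(a - e_i, d + e_i) + (a_i + 1)q(a + e_i, d - e_i)\} \\
&+ \frac{\gamma}{n - 1}\sum_{i=1}^{n}\sum_{j \neq i} f_i\Big\{(1 - b)\frac{a_j + 1}{f_j + 1}q(a - e_i + e_j, d)
+ (1 + b)\frac{d_j + 1}{f_j + 1}q(a, d - e_i + e_j) \\
&\qquad\qquad + 2b\frac{a_j + 1}{f_j + 1}q(a + e_j, d)\Big\},
\end{aligned} \tag{3.8}$$

where

$$r(a, d) = \sum_{i=1}^{n} f_i\{(f_i - 1) + \theta + \gamma(1 + b)\}. \tag{3.9}$$

The probabilities with negative arguments are zero. The boundary conditions are

$$q(e_i, 0) = p, \quad q(0, e_i) = 1 - p, \quad i = 1, 2, \ldots, n, \tag{3.10}$$

where $p$ is the probability that the ultimate ancestor is of type $A_1$.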

As for Theorem 5.2 in [15] for the ancestral selection graph, it is straightforward
to prove Theorem 3.1, and we do not present the proof. Theorem 3.1 can be proved either
by computing the moments or by using the structure of the ancestral bias graph. $p$ is the
expected value of the relative frequency of $A_1$ in the diffusive limit under stationarity.
Although an analytic expression for $p$ is not available, it is possible to simulate it by
using coupling from the past [24, 25].

In principle, we can obtain the probability of any sample by using the importance sampling
algorithm [22], but here we focus on analytic expressions for a sample of size two
($|f| = 2$). By solving Eq. (3.8), we have

$$q(2e_i, 0) = \frac{\gamma'/n + \theta + (1 + 2\theta + \gamma')(\theta - b\gamma)}{2\{\gamma'/n + 2\theta(1 + 2\theta + \gamma')\}} + O(b^2), \tag{3.11}$$

$$q(e_i + e_j, 0) = \frac{\gamma'/n + (1 + 2\theta + \gamma')(\theta - b\gamma)}{2\{\gamma'/n + 2\theta(1 + 2\theta + \gamma')\}} + O(b^2), \tag{3.12}$$

$$q(e_i, e_i) = \frac{\theta(2\theta + \gamma')}{\gamma'/n + 2\theta(1 + 2\theta + \gamma')} + O(b^2), \quad
q(e_i, e_j) = \frac{\theta(1 + 2\theta + \gamma')}{2\{\gamma'/n + 2\theta(1 + 2\theta + \gamma')\}} + O(b^2), \tag{3.13}$$

for $i \neq j$; $i, j = 1, 2, \ldots, n$. $q(0, 2e_i)$ and $q(0, e_i + e_j)$ are given by
Eqs. (3.11) and (3.12), respectively, by replacing $b$ by $-b$. It can be seen that the
effect of conversion bias on the sampling distribution is not as large as the effect of bias
on the probability of fixation, Eq. (2.11): in the formula for the probability of fixation,
the ratio of the correction term of $O(b)$ to the term of $O(1)$ is proportional to the
number of demes/loci. The sampling distribution reduces to that of the model without bias
when mutation events dominate biased migration/conversion events ($bc \ll u$), as well as
in the weak migration/conversion limit. Ohta [23] defined the identity coefficients between
members of a multigene family: $f$ is the average probability of allelic identity, $c_1$ is
the average probability of identity at different loci on one chromosome, and $c_2$ is that
of two genes taken from different loci of two homologous chromosomes. For unlinked loci,
$c_1 = c_2$. In terms of the island model, $f$ is the average probability of gene identity
for genes sampled from the same deme, while $c_1 = c_2$ is that of two genes taken from
different demes. We have

$$f = q(2e_i, 0) + q(0, 2e_i), \tag{3.14}$$

$$c_1 = c_2 = q(e_i + e_j, 0) + q(0, e_i + e_j). \tag{3.15}$$

When $b = 0$, Eqs. (3.14) and (3.15) reduce to Eqs. 12 in [23].
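The size-two formulas can be cross-checked against the recursion in the unbiased case. The sketch below is our own specialisation, offered as an illustration rather than as the derivation used in the text: for $b = 0$ and $q(e_i, 0) = 1/2$, the recursion for $|f| = 2$ reduces to the two linear equations $(2 + 4\theta + 2\gamma)q_s - 2\gamma q_d = 1 + \theta$ and $-\frac{\gamma}{n-1}q_s + (2\theta + \frac{\gamma}{n-1})q_d = \theta/2$, where $q_s = q(2e_i, 0)$ and $q_d = q(e_i + e_j, 0)$. The code solves this system by Cramer's rule and compares it with the closed forms; all function names are hypothetical.

```python
def size_two_closed_form(n, theta, gamma, b):
    """Closed forms for q(2e_i, 0) and q(e_i + e_j, 0), accurate to first order in b."""
    gamma_prime = n * gamma / (n - 1)
    big_m = 1 + 2 * theta + gamma_prime
    denom = 2 * (gamma_prime / n + 2 * theta * big_m)
    q_same = (gamma_prime / n + theta + big_m * (theta - b * gamma)) / denom
    q_diff = (gamma_prime / n + big_m * (theta - b * gamma)) / denom
    return q_same, q_diff


def size_two_neutral_linear_system(n, theta, gamma):
    """Solve the b = 0 recursion for a sample of size two by Cramer's rule."""
    g = gamma / (n - 1)
    a11, a12, r1 = 2 + 4 * theta + 2 * gamma, -2 * gamma, 1 + theta
    a21, a22, r2 = -g, 2 * theta + g, theta / 2
    det = a11 * a22 - a12 * a21
    q_same = (r1 * a22 - a12 * r2) / det
    q_diff = (a11 * r2 - a21 * r1) / det
    return q_same, q_diff


def identity_coefficients(n, theta, gamma):
    """Identity coefficients f and c_1 = c_2 without bias (b = 0)."""
    q_same, q_diff = size_two_closed_form(n, theta, gamma, 0.0)
    return 2 * q_same, 2 * q_diff
```

For example, `identity_coefficients(36, 2.62e-4, 0.0406)` gives a between-locus identity close to 0.84, which is the regime relevant to the histone data analysed in Section 5.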

4. The strong migration/conversion limit 

Nagylaki [13] established the strong-migration limit for a geographically structured
population. Let $X(t) = \sum_{i=1}^{n} Z_i(t)/(nN)$, which is the frequency of $A_1$ in the
entire population. Since the deme size is not altered in our $n$-island model, the effective
size is the total size, and all effects of population subdivision disappear [13]. It is
straightforward to verify that the conditions (Eqs. 22 in [14]) are satisfied, and we obtain
the limiting diffusion of the continuous-time Markov chain $Z(t)$ with $m$ and $n$ fixed and
$Nb \to \beta$ as $N \to \infty$ (strong migration limit) in the diffusion time units of
$n$. The generator is

$$L = \frac{x(1 - x)}{2}\frac{d^2}{dx^2} - \left\{nm\beta x(1 - x) - \frac{n\theta}{2}(1 - 2x)\right\}\frac{d}{dx}. \tag{4.1}$$

Eq. (4.1) is identical to the generator of the diffusion for mutation and selection, which
has been extensively investigated; here, the mutation rate is $nu$ and the selection
intensity is $2nmb$.
The probability of fixation of allele $A_1$, whose initial frequency is $p$, is given by

$$\pi(p) = \frac{e^{2nm\beta p} - 1}{e^{2nm\beta} - 1}, \tag{4.2}$$

which agrees with Eq. (2.8). In the strong migration limit, the ancestral bias graph reduces
to the ancestral selection graph, which has been extensively investigated [15, 21], with
the scaled selection intensity ($\sigma$ in [15]) being $2nm\beta$. In the strong migration
limit, the population becomes panmictic and branching events dominate. The branching rate
for each individual in the ancestral bias graph is $nm\beta$ (note that time is measured in
units of $n$), which is equal to the rate in the ancestral selection graph whose selection
intensity is $2nm\beta$. By using the density of the unique stationary distribution
(Wright's formula [26]), it is possible to obtain an analytic expression for $q(a, d)$.
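The agreement between the strong-migration fixation probability and the first-order expansion can be illustrated numerically. The following sketch assumes the form $\pi(p) = (e^{2nm\beta p} - 1)/(e^{2nm\beta} - 1)$, which satisfies $\pi(0) = 0$ and $\pi(1) = 1$, and compares it with the small-bias approximation $p - nm\beta p(1 - p)$; the function names are hypothetical.

```python
import math


def fixation_strong_migration(p, n, m, beta):
    """Fixation probability of A1 in the strong migration limit."""
    s = 2.0 * n * m * beta              # scaled selection-like intensity
    if abs(s) < 1e-12:
        return p                        # neutral case: fixation probability equals p
    # expm1 keeps precision when s is small
    return math.expm1(s * p) / math.expm1(s)


def fixation_small_bias(p, n, m, beta):
    """First-order approximation in the scaled bias."""
    return p - n * m * beta * p * (1.0 - p)
```

Because the bias acts against $A_1$, the fixation probability lies below the neutral value $p$ for every interior starting frequency.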

5. Bias in ectopic gene conversion among members of a multigene family

An exon sequence (393 base pairs) of the mouse histone H2A gene (a single-exon gene) was
retrieved from Ensembl release 53 (http://www.ensembl.org/), and hypothetical family member
genes were searched for in the complete mouse genome build 37.1 using BLASTN [27]. A
minimum of 90% similarity to the reference sequence and 90% coverage of the family member
genes were required. We obtained 36 sequences, whose GC content at the third
codon position was 88.0%, which is significantly higher than the average GC content of the
mouse genome (42%) [28]. This high GC content is probably due to biased gene conversion
among the histone gene family [6]. As long as the bias $b$ is small, the moment estimates of
the parameters obtained from the formula for the identity coefficients, Eq. (3.15), without
bias and that for the substitution rate, Eq. (2.12), have an accuracy up to $O(b^2)$. The
actual process of conversion is likely to involve a piece of a gene. In the analysis below,
a single nucleotide site is considered. Here, $c$ is the average rate at which a nucleotide
is converted by the homologous nucleotide of another locus belonging to the multigene
family. We assume $n = 36$ and that the substitution rate is $1.22 \times 10^{-9}$ per site
per generation. The substitution
rate was estimated from the sequence divergence between the mouse sequence and
the homologous rat sequence at the third codon position (14.5%), the estimated mouse-rat
divergence time (33 million years [29]), and the average generation time of the rat
(0.5 years). We dichotomize nucleotides into AT/GC and set AT and GC nucleotides as alleles
$A_1$ and $A_2$, respectively. Under the assumption that substitutions occur symmetrically,
2/3 of the substitutions occur between AT and GC, and we have $u = 0.813 \times 10^{-9}$.
Assuming the rat effective size to be $N = 1.61 \times 10^5$ [30], we have
$\theta = 2.62 \times 10^{-4}$. The average identity coefficient at the third codon position
in comparisons between two sequences was $c_2 = 0.840$. According to Eq. (3.15), we estimate
$c = 1.26 \times 10^{-7}$. (The estimate is robust to the dichotomy of nucleotides and the
assumption of free recombination: even if we use a formula for complete linkage with a
four-allele model [23], we have $c = 1.00 \times 10^{-7}$.) From the substitution rate,
Eq. (2.12), the expected fraction of genes of type $A_1$ at a site is

$$\eta(t) = \bar{\eta} + (\eta(0) - \bar{\eta})e^{-2ut}, \quad
\bar{\eta} = \frac{1 - (n - 1)(1 + \gamma')b}{2}. \tag{5.1}$$

If the GC content has reached its equilibrium, $1 - \bar{\eta} = 0.880$ and we have
$b = 0.0209$. Backstrom et al. [31] showed that the chicken HINTW gene family linked to the
W chromosome seems to
undergo gene conversion at a rate of c = (3-4) x 10~^. They reported that the GC content 
of the intron (55.3%) is significantly higher than that of introns of other W-linked genes 
(40.0%). If the GC content reaches the equilibrium, we have b = (6-8) x 10"'*. If the GC 
content does not reach the equilibrium, the actual bias could be larger. Recent estimates 
of scaled allelic conversion bias {ANb' , where the GC gametes produced by a heterozygote 
individual being given by (1 + b')/2 [3]) in high-recombination region are 1.7 or 0.60 in 
Drosophila [32l[33], and 12.7 in humans [33]. It is suggested that ectopic gene conversion 
is also biased, whose magnitude could be orders of magnitude larger than that in allelic 
gene conversion. The relatively strong bias in ectopic gene conversion is plausible; bias 
in allelic gene conversion is equivalent to genie selection whose intensity is b, while bias 
in ectopic gene conversion is equivalent to genie selection whose intensity is 2ncb, at least 
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in the strong migration limit Eq l4.1l For the mouse histone H2A gene family, the scaled 
conversion bias is $2nc\beta = 0.122$, and the effect of the bias is almost neutral. Since nc
is sufficiently smaller than unity except for very large multigene families, a relatively large
conversion bias can be maintained in a population against purifying selection.
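
The comparison in the preceding paragraph can be made concrete with a small numerical sketch. It evaluates the effective selection intensity 2ncb quoted above for ectopic conversion and compares it with b, the intensity for allelic conversion. Only b = 0.0209 is taken from the text; the family size `n_loci` and the conversion rate `c_rate` are hypothetical placeholders, and the function names are illustrative rather than part of the original analysis.

```python
def ectopic_selection_intensity(n, c, b):
    """Effective genic-selection intensity 2*n*c*b for ectopic conversion bias.

    This is the strong-limit equivalence quoted in the text, not a general result.
    """
    return 2.0 * n * c * b


def attenuation_factor(n, c):
    """Ratio of the ectopic intensity (2ncb) to the allelic intensity (b)."""
    return 2.0 * n * c


if __name__ == "__main__":
    b = 0.0209        # conversion bias estimated in the text
    n_loci = 10       # hypothetical number of loci in the gene family
    c_rate = 1.0e-4   # hypothetical conversion rate per locus per generation

    intensity = ectopic_selection_intensity(n_loci, c_rate, b)
    print(f"effective intensity 2ncb = {intensity:.3e}")
    print(f"attenuation 2nc          = {attenuation_factor(n_loci, c_rate):.3e}")
```

With these placeholder values the factor 2nc is far below one, which is the sense in which a comparatively large bias b translates into weak effective selection unless the family is very large.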

6. Discussion 

It was recently found that the diffusion models of genic selection in an n-island model
and of gene conversion among members of a multigene family in a panmictic population
are identical [16]. In this study, we have developed two continuous-time Moran
models of bias in evolutionary mechanisms: an n-island model and a model of ectopic 
gene conversion among members of a multigene family in a panmictic population. The 
models are very different but have the same diffusive limit, where allele frequencies in 
each subpopulation of the n-island model are identical to allele frequencies in each locus 
of the n-gene family. Nagylaki [3] showed that bias in allelic gene conversion is equivalent to
genic selection. In contrast, it was shown that bias in ectopic gene conversion is equivalent
to genic selection only in the strong conversion limit. The n-island model was formulated
in terms of a hierarchical biased voter model, and the ancestral bias graph was introduced
as an object generated by the dual process. As with the ancestral selection graph [15], it
is possible to compute the quantities of interest, such as the probability of genes being
identical by descent and the time to the most recent common ancestor, by expanding the
ancestral bias graph in the population-scaled migration/conversion rate ($\gamma$). Due to the
moment duality between the diffusion process and the biased voter model in the diffusive 
limit, we can study some properties of the ancestral bias graph from the probability of 
fixation in the diffusion process (See [H]), and the reduction of the ancestral bias graph to 
the ancestral selection graph was shown via the strong migration/conversion limit of the 
diffusion process. The model presented here is minimal, but the migration and conversion 
schemes can be extended in various ways. 
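
As an illustration of the kind of model discussed above, the following is a minimal simulation sketch of a two-allele island model with allele-dependent migration, written as a biased voter model. It is not the paper's exact formulation: the event rates (within-island replacement at rate 1 per individual, copying of an A1 migrant at rate m(1 + b) and of an A2 migrant at rate m), the function name, and all parameter values are assumptions chosen for illustration. Only the embedded jump chain is simulated, which is enough to estimate a fixation probability.

```python
import random


def fixation_probability(n, N, m, b, trials, seed=0):
    """Estimate the fixation probability of a single A1 mutant.

    Hypothetical setup: n islands of N haploid individuals, alleles A1 (1)
    and A2 (0). A randomly chosen individual is replaced either by a copy of
    an individual from its own island or by a copy of a migrant from another
    island; migrant copies of A1 are favoured by the factor (1 + b).
    """
    if n < 2 or b < 0:
        raise ValueError("need n >= 2 islands and bias b >= 0")
    rng = random.Random(seed)
    total = n * N
    # Proposal rates per target: 1 (local) and m * (1 + b) (migration).
    p_local = 1.0 / (1.0 + m * (1.0 + b))
    # Thinning: an A2 migrant copy is accepted with probability 1 / (1 + b),
    # so A2 is copied at rate m and A1 at rate m * (1 + b).
    p_accept_a2 = 1.0 / (1.0 + b)
    fixed = 0
    for _ in range(trials):
        pop = [[0] * N for _ in range(n)]
        pop[0][0] = 1          # one new A1 mutant in island 0
        count = 1              # current number of A1 copies
        while 0 < count < total:
            i = rng.randrange(n)       # island of the replaced individual
            k = rng.randrange(N)       # site of the replaced individual
            if rng.random() < p_local:
                donor = pop[i][rng.randrange(N)]
            else:
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1             # uniform choice among the other islands
                donor = pop[j][rng.randrange(N)]
                if donor == 0 and rng.random() >= p_accept_a2:
                    continue           # rejected A2 migration proposal
            count += donor - pop[i][k]
            pop[i][k] = donor
        if count == total:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    neutral = fixation_probability(n=2, N=4, m=1.0, b=0.0, trials=2000, seed=1)
    biased = fixation_probability(n=2, N=4, m=1.0, b=2.0, trials=2000, seed=1)
    print(f"neutral estimate: {neutral:.3f} (1/(nN) = {1 / 8:.3f})")
    print(f"biased estimate : {biased:.3f}")
```

Without bias the number of A1 copies is a martingale in this sketch, so the estimate should scatter around 1/(nN); with b > 0 the mutant fixes more often. The population is kept very small so that the run is fast, which means the diffusive limit is not approached.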

In this study, we have investigated concerted evolution by biased gene conversion among
members of a multigene family. Unequal crossing-over is the other possible mechanism of
concerted evolution. Unequal crossing-over occurs fairly frequently between nonallelic
homologous genes in a large multigene family, where a large number (on the order of 100 or
even 1,000) of homologous genes are tandemly arranged on the chromosome [35][36]. When
the average shift of gene units is large, unequal crossing-over can dominate [35]. Bias in
gene conversion is considered to be caused by bias in the repair of double-strand breaks by
recombination. Since concerted evolution by unequal crossing-over does not involve this
bias, unequal crossing-over is probably less biased. In addition, bias in concerted evolution
of a large multigene family will be difficult to evolve in a population, even if biased gene
conversion operates. Since the scaled conversion bias is proportional to the number of
loci, bias in a large gene family will be deleterious and will be removed from a population by
purifying selection.

Remarkably, bias in gene conversion and bias in migration are mathematically equivalent
phenomena in the diffusive limit. Recently, evidence of the large impact of biased
gene conversion on gene substitution has been accumulating. Although the data that demonstrate
migration bias are still limited, we can speculate that there might be a slight bias in the
migration rate associated with alleles. If the number of migrants is large, the effect of migration
bias is equivalent to genic selection. Since a gene conversion bias of a few percent could
cause a substantial increase in the GC content, it seems likely that a slight bias in migration
has a large impact on population differentiation and speciation in natural populations. If
population dispersal involved gradual population subdivision, the effective size could have
been reduced without a reduction in the census population size; the effective size could be
reduced substantially by rapid fixation of alleles associated with people who migrate quickly.

7. Appendix

The diffusion process $\{X(t); t \ge 0\}$ in the n-dimensional cube $[0, 1]^n$ has exit boundaries
at $\mathbf{0}$ and $\mathbf{1}$. As $t \to \infty$, finite probability mass remains only at these exit boundaries, and
we have

(7.1)    $\lim_{t \to \infty} E[X_i(t)] = \pi(\mathbf{p}), \quad i = 1, 2, \ldots, n.$

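Let $\mu_\alpha(t) = E[X^\alpha(t)]$ and expand the Laplace transform in a power series in b:

(7.2)    $\nu_\alpha(s) = \nu_\alpha^{(0)}(s) + b\,\nu_\alpha^{(1)}(s) + O(b^2).$

At the 0-th order in b we have a system of equations

(7.3)    $(s + \gamma')\,\nu_{e_i}^{(0)} - \frac{\gamma'}{n} \sum_{j=1}^{n} \nu_{e_j}^{(0)} = p_i, \quad i = 1, 2, \ldots, n.$

The solution is

(7.4)    $\nu_{e_i}^{(0)}(s) = \frac{p}{s} + \frac{p_i - p}{s + \gamma'}, \quad i = 1, 2, \ldots, n.$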
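
The zeroth-order solution can be checked numerically. The sketch below assumes that the zeroth-order system has the form $(s + \gamma')\nu_i - (\gamma'/n)\sum_j \nu_j = p_i$ and that p denotes the mean of the initial frequencies $p_i$; both are assumptions made for this check, as are the function names and the numerical values. It verifies that the stated solution $p/s + (p_i - p)/(s + \gamma')$ leaves no residual, and that $s\,\nu_i(s)$ tends to p as s tends to 0, which is the zeroth-order fixation probability.

```python
def nu0(s, p_i, p_mean, gamma_prime):
    """Zeroth-order Laplace-domain first moment, as in Eq. (7.4)."""
    return p_mean / s + (p_i - p_mean) / (s + gamma_prime)


def residuals(s, p, gamma_prime):
    """Residual of the assumed zeroth-order system for each island or locus.

    Assumed system: (s + g) * nu_i - (g / n) * sum_j nu_j - p_i = 0.
    """
    n = len(p)
    p_mean = sum(p) / n
    nus = [nu0(s, p_i, p_mean, gamma_prime) for p_i in p]
    mean_nu = sum(nus) / n
    return [
        (s + gamma_prime) * nu - gamma_prime * mean_nu - p_i
        for nu, p_i in zip(nus, p)
    ]


if __name__ == "__main__":
    p = [0.25, 0.0, 0.0, 0.0]   # hypothetical initial frequencies, n = 4
    gamma_prime = 3.0           # hypothetical scaled rate
    print(max(abs(r) for r in residuals(0.7, p, gamma_prime)))
    # Final-value check: s * nu(s) approaches the mean frequency as s -> 0.
    s_small = 1e-9
    print(s_small * nu0(s_small, p[0], sum(p) / len(p), gamma_prime))
```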

Consult [16] for the derivation. By applying the inverse Laplace transform, we have
$\mu_{e_i}^{(0)}(t) = p + (p_i - p)e^{-\gamma' t}$. Thus, $\pi^{(0)}(\mathbf{p}) = p$. In the same manner, for $i = 1, 2, \ldots, n$,



n 

and for $i \neq j$; $i, j = 1, 2, \ldots, n$,

(7.6) (. + 2,>™ - t + e..) = 



k=l 
They can be solved for $\nu_{2e_i}^{(0)}$ and $\nu_{e_i+e_j}^{(0)}$. At the first order in b, we have a system of
equations for $i = 1, 2, \ldots, n$,



n ^ 



(7.7) = ^ <; 

n 



n 



Substituting Eq. (7.4) and the solutions of Eqs. (7.5) and (7.6) into Eq. (7.7), we have
(7.8)    $\nu_{e_i}^{(1)}(s) = \frac{a_0}{s} + \sum_{j} \frac{a_j}{s - s_j}, \quad i = 1, 2, \ldots, n,$

where 



• 1 - 



(7.9) ao = (n-l)p{l + y(l-p)}-- VpiPi- 

Kj 

$s_j$ are the eigenvalues of the generator, Eq. (2.3), and $a_j$, $j \neq 0$, are constants which do not
depend on s. Then, by applying the inverse Laplace transform, Eq. (2.10) follows.
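
The last step can be sketched with the final-value theorem. The derivation below assumes that the first-order transform is a sum of simple poles, one at s = 0 with residue $a_0$ and the others at the eigenvalues $s_j$, and that every $s_j$ is strictly negative so that only the pole at s = 0 contributes at long times. These assumptions and the notation $\nu_{e_i}$ for the Laplace transform of $\mu_{e_i}(t) = E[X_i(t)]$ are stated here for the purpose of the sketch.

```latex
\begin{align*}
\pi(\mathbf{p})
  &= \lim_{t \to \infty} \mu_{e_i}(t)
   = \lim_{s \to 0} s\,\nu_{e_i}(s) \\
  &= \lim_{s \to 0} s \left[ \frac{p}{s} + \frac{p_i - p}{s + \gamma'}
     + b \left( \frac{a_0}{s} + \sum_{j} \frac{a_j}{s - s_j} \right) \right]
     + O(b^2) \\
  &= p + b\,a_0 + O(b^2).
\end{align*}
```

Under these assumptions the first-order correction to the fixation probability is the constant $a_0$ alone; the constants $a_j$ affect only the transient approach to the boundaries.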
Figure 1. The graphical representation for the biased voter model for the
case n = 2 and N = A. If initially the set of ^I's is {1} G then at time 

t, the set of $A_1$'s is {2} ∈ $I_1$ and {3} ∈ $I_2$. $I_1$ and $I_2$ are the left and the
right graph, respectively. The paths of $A_1$'s are indicated by thick lines.
Figure 2. The graphical representation for the dual process of the biased
voter model. Here, the ancestral history of a sample consisting of individuals
at sites {1,2} ∈ $I_1$ and {1} ∈ $I_2$ at dual time 0 is indicated by thick lines.
The ultimate ancestor is at {1} ∈ $I_2$ at dual time t. If the ultimate ancestor
is $A_1$, then the true genealogy contains the dotted line and does not contain
the dashed line, and vice versa.



